Conditional screening for ultrahigh-dimensional survival data in case-cohort studies

The case-cohort design has been widely used to reduce the cost of covariate measurements in large cohort studies. In many such studies, the number of covariates is very large, and the goal of the research is to identify active covariates that have a strong influence on the response. Since the introduction of sure independence screening (SIS), screening procedures have achieved great success in reducing dimensionality and identifying active covariates. However, commonly used screening methods are based on marginal correlation or its variants, so they may fail to identify hidden active variables that are jointly important but only weakly correlated with the response. Moreover, these screening methods are mainly proposed for data under simple random sampling and cannot be directly applied to case-cohort data. In this paper, we consider ultrahigh-dimensional survival data under the case-cohort design and propose a conditional screening method that incorporates prior information about known active variables. This method can effectively detect hidden active variables. Furthermore, it possesses the sure screening property under some mild regularity conditions and does not require any complicated numerical optimization. We evaluate the finite-sample performance of the proposed method via extensive simulation studies and further illustrate the new approach through a real data set from patients with breast cancer.


Introduction
In large epidemiological cohort studies, it is common that some diseases of interest (e.g., cancer, heart disease, HIV infection) have very low incidence. In addition, some exposures can be very expensive to measure, and it is not feasible to measure them on all cohort members because of limited resources. To reduce the cost while retaining as much efficiency as possible, Prentice (1986) proposed the case-cohort design, in which the expensive covariates are obtained only for a random sample of the full cohort, called the subcohort, and for the additional cases who experience the event of interest during the follow-up period. When the covariate dimension p is smaller than the sample size n, various methods have been proposed for analyzing data under this design, such as the pseudo-likelihood approach (Prentice, 1986; Self and Prentice, 1988; Kalbfleisch and Lawless, 1988), the estimating equation method (Chen and Lo, 1999; Chen, 2001), the multiple imputation approach (Marti and Chavance, 2011; Keogh and White, 2013), maximum likelihood estimation (Scheike and Martinussen, 2004; Zeng and Lin, 2014), and the weighted estimating equation approach (Barlow, 1994; Borgan et al., 2000; Kulich and Lin, 2004; Breslow and Wellner, 2007; Kang and Cai, 2009; Kim et al., 2013), among others.
With the rapid development of biomedical technology, high-dimensional data are frequently collected in large epidemiological studies. The defining feature of such data is that the covariate dimension p is much larger than the sample size n. An important purpose of analyzing this type of data is to identify a subset of covariates related to the event of interest and to construct effective models based on the selected covariates. For scenarios where p increases with n at a polynomial rate (e.g., $p = n^{\alpha}$ with $\alpha > 0$), regularization has been demonstrated to be an effective dimension reduction method for simple random sampling (SRS) data (e.g., Tibshirani, 1996; Fan and Li, 2001; Zou, 2006; Candes and Tao, 2007; Zhang, 2010) and has recently been generalized to high-dimensional data under the case-cohort design. For example, Ni et al. (2016) proposed a variable selection procedure using the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) for scenarios where p increases at a slower rate than n. Kim and Ahn (2019) proposed a bi-level variable selection method to select non-zero groups and within-group variables when the variables have a group structure. These methods select variables and estimate parameters simultaneously. However, when the dimension p is ultrahigh in the sense that $p = \exp(n^{\alpha})$ with $\alpha > 0$, regularization methods face simultaneous challenges of computational expediency, statistical accuracy, and algorithmic stability (Fan et al., 2009).
For SRS data, feature screening has achieved great success in dealing with the challenges of ultrahigh-dimensional settings. Various marginal screening methods have been proposed under different settings, such as linear models (Fan and Lv, 2008), generalized linear models (Fan and Song, 2010), additive models (Fan et al., 2011), varying coefficient models (Fan et al., 2014; Liu et al., 2014) and model-free scenarios (e.g., Zhu et al., 2011; Li et al., 2012a; Li et al., 2012b; He et al., 2013; Chang et al., 2013; Cui et al., 2015; Mai and Zou, 2015; Wu and Yin, 2015). For censored survival data, several model-based screening methods (e.g., Tibshirani, 2009; Zhao and Li, 2012; Gorst-Rasmussen and Scheike, 2013) and model-free screening methods (e.g., Song et al., 2014; Wu and Yin, 2015; Zhang et al., 2017; Zhou and Zhu, 2017; Zhang et al., 2018; Lin et al., 2018; Pan et al., 2019) have been proposed by defining different marginal utilities. Although these methods are powerful in reducing the dimensionality, they face challenges in some situations. For instance, as noted in Fan and Lv (2008), the correlation among covariates heavily influences the marginal utility. When the correlation among covariates is relatively high, marginal screening methods may fail to retain hidden active variables, which have a strong influence on the response but are only weakly correlated with it. Although some iterative screening methods (e.g., Fan and Lv, 2008; Zhu et al., 2011; Zhang et al., 2018; Pan et al., 2019) and forward screening approaches (e.g., Wang, 2009) have been proposed to alleviate this problem, their computation is relatively slow and their statistical properties are elusive.
In many applications, researchers can obtain some prior information about active variables from previous investigations and experience. For example, in the breast cancer study (van de Vijver et al., 2002), gene AL080059 is known from the literature to be predictive of patients' survival time (Yeung et al., 2005; van't Veer et al., 2002). Barut et al. (2016) pointed out that the accuracy of variable screening can be improved by including such prior knowledge. Building on this idea, they proposed the conditional screening approach for generalized linear models and showed that conditioning helps reduce the correlation among covariates and thus detects hidden active variables with higher probability. Hong et al. (2016) further proposed to integrate prior information using data-driven approaches. Hu and Lin (2017) put forward a conditional screening procedure that ranks covariates by conditional marginal empirical likelihood ratios. Liu and Wang (2017) proposed a screening method based on conditional distance correlation. Hong et al. (2018) developed a conditional screening method for censored data under the proportional hazards model. Liu and Chen (2018) considered the conditional quantile independence screening approach for ultrahigh-dimensional heterogeneous data. Lu and Lin (2020) proposed a model-free conditional screening method via conditional distance correlation. Extensive simulation studies showed that these conditional screening methods, which incorporate important prior information about active variables, provide a powerful means of identifying hidden active variables in ultrahigh-dimensional data.
The research on marginal and conditional screening methods has been fruitful for ultrahigh-dimensional SRS data. To the best of our knowledge, however, conditional screening has not been studied for case-cohort data, and the existing conditional screening methods cannot be directly applied to case-cohort data because of its special data structure. To fill this gap, we propose a conditional screening method for ultrahigh-dimensional case-cohort data under the framework of the Cox proportional hazards model. We construct a marginal hazards regression model for each covariate that also includes the known important covariates. As some covariates are not fully observed, we build a weighted estimating equation to obtain the parameter estimators. We then propose marginal utilities based on the parameter estimates to measure the contribution of each covariate and retain the covariates with the top-ranked contributions. We refer to this as the conditional weighted screening method, or the CWSIS procedure for short. As pointed out by Barut et al. (2016), the correlation between covariates can be weakened by conditioning, so that hidden active covariates have a higher chance of being retained. Therefore, the proposed method enables the detection of hidden active covariates for ultrahigh-dimensional survival data under the case-cohort design. Under some reasonable conditions, it enjoys the sure screening property and ranking consistency. Our research is the first to focus on conditional screening for ultrahigh-dimensional case-cohort data, and it can be viewed as an extension of Hong et al. (2018) from SRS data to case-cohort data. Although the ideas are similar, the generalization is quite challenging because of the much more complex structure of case-cohort data, and both the implementation and the theory are quite different.
The rest of the article is organized as follows. In Section 2, we introduce the model and data and present the details of the CWSIS procedure. In Section 3, we establish the theoretical properties of the proposed CWSIS method. Section 4 presents results from simulation studies. A real data set from the breast cancer study is analyzed in Section 5. Section 6 provides some remarks and discussion. The regularity conditions and the technical proofs are presented in the Appendix.

Conditional screening for case-cohort data
Suppose there are $n$ independent subjects in a cohort study. Let $T_i$ and $C_i$ denote the failure time and censoring time of subject $i$. Because of right-censoring, we only observe $X_i = \min(T_i, C_i)$ and $\Delta_i = I(T_i \le C_i)$. Let $Z_i = (Z_{i1}, \ldots, Z_{ip})^{T}$ denote the $p$-dimensional covariate vector. Under the case-cohort design, $Z_i$ is available only for the cases ($\Delta_i = 1$) and the subcohort (a random subset of the full cohort). Let $\xi_i$ be the indicator of subcohort membership, i.e., $\xi_i = 1$ if the $i$th subject in the full cohort is selected into the subcohort and $\xi_i = 0$ otherwise. For the selection of the subcohort, we consider independent Bernoulli sampling with selection probability $\pi = \Pr(\xi_i = 1) \in (0, 1)$. Thus, the observable data for the $i$th subject are $\{X_i, \Delta_i, Z_i, \xi_i\}$ when $\xi_i = 1$ or $\Delta_i = 1$, and $\{X_i, \Delta_i, \xi_i\}$ when $\xi_i = 0$ and $\Delta_i = 0$.
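The observation scheme described above can be illustrated with a minimal Python sketch. The function name `case_cohort_sample` and its arguments are hypothetical and chosen only for this illustration; the sketch assumes independent Bernoulli sampling of the subcohort and marks which subjects would have their covariates measured.

```python
import random

def case_cohort_sample(delta, pi, seed=0):
    """Draw subcohort indicators xi_i ~ Bernoulli(pi) for a full cohort.

    delta : list of 0/1 event indicators (1 = case)
    pi    : subcohort selection probability in (0, 1)
    Returns (xi, observed), where observed[i] = 1 means the covariate
    vector of subject i is measured (subcohort member or case).
    """
    rng = random.Random(seed)
    xi = [1 if rng.random() < pi else 0 for _ in delta]
    # Covariates are available for subcohort members and for all cases.
    observed = [1 if (x == 1 or d == 1) else 0 for x, d in zip(xi, delta)]
    return xi, observed

delta = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
xi, observed = case_cohort_sample(delta, pi=0.3, seed=1)
n_cc = sum(observed)  # size of the case-cohort sample
```

Every case is always observed, whereas a censored subject contributes covariates only when it falls in the subcohort.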
Suppose that the failure time follows the proportional hazards model (Cox, 1972), under which the conditional hazard function of $T_i$ given $Z_i$ has the form
$$\lambda(t \mid Z_i) = \lambda_0(t) \exp(\alpha^{T} Z_i), \qquad (1)$$
where $\lambda_0(t)$ is the unspecified baseline hazard function and $\alpha = (\alpha_1, \ldots, \alpha_p)^{T}$ is the unknown regression parameter. Assume that the failure time $T_i$ and the censoring time $C_i$ are independent given $Z_i$. In an ultrahigh-dimensional setting, the dimensionality $p$ greatly exceeds the sample size $n$ and is allowed to increase at an exponential rate of $n$. Under the sparsity principle, only a small number of covariates have a strong influence on the response variable, i.e., $\|\alpha\|_0$ is much smaller than $p$, where $\|\alpha\|_0$ denotes the number of nonzero elements of $\alpha$. Assume we have the prior information that a set of covariates is related to the survival time $T$. The index set of these covariates is denoted by $\mathcal{C}$, and $q = |\mathcal{C}|$ denotes the number of covariates in $\mathcal{C}$. Write $Z_{i,\mathcal{C}} = (Z_{ij}, j \in \mathcal{C})$, $Z_{i,-\mathcal{C}} = (Z_{ij}, j \notin \mathcal{C})$, $\alpha_{\mathcal{C}} = (\alpha_j, j \in \mathcal{C})$ and $\alpha_{-\mathcal{C}} = (\alpha_j, j \notin \mathcal{C})$.
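Here, the conditioning set $\mathcal{C}$ is known, while $\alpha_{\mathcal{C}}$ and $\alpha_{-\mathcal{C}}$ are unknown. The true hazard function in (1) is equivalent to
$$\lambda(t \mid Z_i) = \lambda_0(t) \exp(\alpha_{\mathcal{C}}^{T} Z_{i,\mathcal{C}} + \alpha_{-\mathcal{C}}^{T} Z_{i,-\mathcal{C}}). \qquad (2)$$
Let $\mathcal{M}_{-\mathcal{C}} = \{j \notin \mathcal{C} : \alpha_j \neq 0\}$ and $a = |\mathcal{M}_{-\mathcal{C}}| = \sum_{j \notin \mathcal{C}} I(\alpha_j \neq 0)$ be the true set of non-zero coefficients outside $\mathcal{C}$ and its cardinality. Our goal is to recover the set $\mathcal{M}_{-\mathcal{C}}$ as precisely as possible based on data from case-cohort studies. In other words, we want to find an index set $\widehat{\mathcal{M}}_{-\mathcal{C}}$ that satisfies $\mathcal{M}_{-\mathcal{C}} \subseteq \widehat{\mathcal{M}}_{-\mathcal{C}}$.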
To perform an initial screening, we construct a marginal Cox regression model for each covariate individually, adding the known covariates in the conditioning set $\mathcal{C}$ to each marginal model. Specifically, for $j \notin \mathcal{C}$, the hazard function of $T_i$ given $(Z_{i,\mathcal{C}}, Z_{ij})$ has the form
$$\lambda_j(t \mid Z_{i,\mathcal{C}}, Z_{ij}) = \lambda_{j,0}(t) \exp(\beta_{\mathcal{C},j}^{T} Z_{i,\mathcal{C}} + \beta_j Z_{ij}),$$
where $\lambda_{j,0}(t)$ is the unspecified baseline hazard function, and $\beta_{\mathcal{C},j}$ and $\beta_j$ are the unknown regression parameters corresponding to the covariates $Z_{\mathcal{C}}$ and $Z_j$ in the marginal Cox model, respectively. Since for case-cohort data the covariates are observed only for the selected subcohort and the cases, we estimate these parameters from a weighted estimating equation in which the $i$th subject is weighted by the inverse probability of selection: the weight $w_i(t)$ equals 1 for the cases and $\hat{\pi}(t)^{-1}$ for the sampled censored subjects, where $\hat{\pi}(t)$ is a consistent estimator of the true sampling probability $\pi$. The maximum marginal pseudo-partial likelihood estimator $(\hat{\beta}_{\mathcal{C},j}, \hat{\beta}_j)$ is defined as the solution to the weighted estimating equation $U_j(\beta_{\mathcal{C},j}, \beta_j) = 0_{q+1}$. Let $\hat{\sigma}_j^2$ be the variance estimate of $\hat{\beta}_j$, i.e., the $(q+1)$th diagonal element of the inverse of the information matrix evaluated at $(\hat{\beta}_{\mathcal{C},j}, \hat{\beta}_j)$, and define
$$M_{\mathcal{C},j} = |\hat{\beta}_j| / \hat{\sigma}_j,$$
which serves as the proposed utility measure for the $j$th covariate. We rank the covariates $Z_j$ ($j \notin \mathcal{C}$) by the value of $M_{\mathcal{C},j}$ from largest to smallest and retain those at the top of the ranked list. For a given threshold $\gamma > 0$, the index set selected in addition to $\mathcal{C}$ is given by
$$\widehat{\mathcal{M}}_{-\mathcal{C}} = \{j \notin \mathcal{C} : M_{\mathcal{C},j} \ge \gamma\}.$$
In practical applications, we can pre-determine a positive integer $d_0$ and define the estimated active set as
$$\widehat{\mathcal{M}}_{-\mathcal{C}} = \{j \notin \mathcal{C} : M_{\mathcal{C},j} \text{ is among the } d_0 \text{ largest}\}.$$
Similar to Fan and Lv (2008) and other literature on feature screening, we can choose $d_0 = \lceil n_{cc}/\log n_{cc} \rceil$, where $n_{cc}$ denotes the case-cohort sample size.
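The ranking step can be sketched in plain Python as follows. This is an illustrative sketch under simplifying assumptions, not the authors' implementation: it uses a time-fixed weight (1 for cases, the inverse of the observed sampling fraction among non-cases for sampled non-cases) instead of a time-dependent weight, takes the variance of each estimate from the inverse information matrix, and assumes no tied event times. All function names (`solve`, `score_info`, `weighted_cox`, `cwsis_utilities`, `top_d0`) are hypothetical.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def score_info(beta, time, delta, w, V, idx):
    """Weighted pseudo-score U and information I of a Cox model at beta."""
    d = len(beta)
    U = [0.0] * d
    I = [[0.0] * d for _ in range(d)]
    risk = {l: w[l] * math.exp(sum(b * v for b, v in zip(beta, V[l]))) for l in idx}
    for i in idx:
        if delta[i] == 0:
            continue
        at_risk = [l for l in idx if time[l] >= time[i]]
        s0 = sum(risk[l] for l in at_risk)
        s1 = [sum(risk[l] * V[l][a] for l in at_risk) / s0 for a in range(d)]
        for a in range(d):
            U[a] += V[i][a] - s1[a]
            for c in range(d):
                s2 = sum(risk[l] * V[l][a] * V[l][c] for l in at_risk) / s0
                I[a][c] += s2 - s1[a] * s1[c]
    return U, I

def weighted_cox(time, delta, w, V, iters=25, tol=1e-8):
    """Newton-Raphson solution of U(beta) = 0; returns (beta_hat, information)."""
    idx = [i for i in range(len(time)) if w[i] > 0]  # subjects with covariates
    beta = [0.0] * len(V[idx[0]])
    for _ in range(iters):
        U, I = score_info(beta, time, delta, w, V, idx)
        step = solve(I, U)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, score_info(beta, time, delta, w, V, idx)[1]

def cwsis_utilities(time, delta, xi, Z, cond):
    """Utility |beta_j| / sigma_j for every covariate j outside cond.

    Rows of Z for unobserved subjects are ignored (any placeholder works).
    """
    n, p = len(time), len(Z[0])
    n_noncase = sum(1 for d in delta if d == 0)
    n_sampled = sum(1 for d, x in zip(delta, xi) if d == 0 and x == 1)
    pi_hat = n_sampled / n_noncase  # assumption: time-fixed sampling fraction
    w = [1.0 if d == 1 else x / pi_hat for d, x in zip(delta, xi)]
    util = {}
    for j in range(p):
        if j in cond:
            continue
        V = [[Z[i][k] for k in cond] + [Z[i][j]] for i in range(n)]
        beta, info = weighted_cox(time, delta, w, V)
        unit = [0.0] * len(cond) + [1.0]
        var_j = solve(info, unit)[-1]  # last diagonal element of info^{-1}
        util[j] = abs(beta[-1]) / math.sqrt(var_j)
    return util

def top_d0(util, n_cc):
    """Indices of the d0 = ceil(n_cc / log n_cc) largest utilities."""
    d0 = math.ceil(n_cc / math.log(n_cc))
    return sorted(util, key=util.get, reverse=True)[:d0]
```

Each marginal fit involves only $q + 1$ parameters, so the procedure amounts to $p - q$ small Newton-Raphson problems followed by a sort, and no joint optimization over all $p$ covariates is needed.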
Similar to the conditional screening procedures of Barut et al. (2016) and Hong et al. (2018), the outstanding advantage of the proposed CWSIS procedure is that it enables the detection of hidden active covariates for ultrahigh-dimensional case-cohort data. To demonstrate this merit, we set up an example in a similar way to Barut et al. (2016) and Hong et al. (2018). In particular, the failure time $T_i$ follows a Cox proportional hazards model in which the covariates have covariance matrix $(\sigma_{ij})_{p \times p}$ with $\sigma_{ii} = 1$ for $i = 1, \ldots, p$ and $\sigma_{ij} = 0.5$ for $i \neq j$. By this design, $Z_5$ is a hidden active covariate. We consider four different conditioning sets, $\mathcal{C} = \emptyset$, $\{1\}$, $\{1, 2\}$ and $\{6, 7, 8\}$. The densities of the proposed screening statistic $M_{\mathcal{C},j}$ for $Z_5$ (the hidden active covariate) and $Z_6, \ldots, Z_{2000}$ (the inactive covariates) are summarized in Figure 1. When $\mathcal{C} = \emptyset$, CWSIS is equivalent to the marginal screening approach, and the value of $M_{\mathcal{C},j}$ for $Z_5$ is, with high probability, much smaller than the corresponding values for the inactive covariates. When the conditioning set includes one truly active covariate ($\mathcal{C} = \{1\}$), the curve for $Z_5$ lies to the right and there is a clear separation between the two curves. When we include more truly active covariates ($\mathcal{C} = \{1, 2\}$), this separation becomes larger. Interestingly, when the conditioning set consists of three inactive covariates ($\mathcal{C} = \{6, 7, 8\}$), the chance of identifying the hidden variable $Z_5$ using CWSIS is still higher than with the marginal screening method. A possible explanation is that, because these inactive variables are correlated with the active covariates, they can effectively function as surrogates for the active variables, so conditioning on them helps detect hidden variables. A similar phenomenon was also observed in Barut et al. (2016) and Hong et al. (2018).
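As a hedged illustration of the covariance structure in this example, the following sketch draws covariates with unit variances and a common pairwise correlation of 0.5 by adding a shared factor to independent noise. Normality and the function names `equicorrelated_row` and `sample_corr` are assumptions made for the illustration; the text above only specifies the covariance matrix.

```python
import math
import random

def equicorrelated_row(p, rho, rng):
    """One draw of p unit-variance normals with pairwise correlation rho.

    Uses Z_j = sqrt(rho) * W + sqrt(1 - rho) * e_j with W, e_j iid N(0, 1),
    which requires 0 <= rho < 1.
    """
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)]

def sample_corr(x, y):
    """Pearson sample correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
rows = [equicorrelated_row(5, 0.5, rng) for _ in range(4000)]
r_15 = sample_corr([r[0] for r in rows], [r[4] for r in rows])  # close to 0.5
```

With this structure every covariate is correlated with the active ones, so an active covariate whose own effect is offset by the effects of its correlated partners can have almost no marginal association with the survival time.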

Theoretical property
In this section, we show that the CWSIS procedure enjoys the sure screening property and the ranking consistency property. Ranking consistency means that the procedure tends to rank the active covariates above the inactive ones with high probability, and sure screening means that all the active covariates survive the screening with probability tending to 1 as $n \to \infty$. These two properties lay out the theoretical foundation of our CWSIS procedure. Define

The regularity conditions are given in Appendix A, under which we establish the following lemmas and theorems.
Lemma 2 Suppose conditions C1-C8 hold. Then there exist constants $c_2 > 0$ and $0 < \kappa < 1/2$ such that

Lemma 3 Under conditions C1-C8, for any $\epsilon_1 > 0$ and $\epsilon_2 > 0$, there exist a positive constant $c_3$ and an integer $N$ such that for any $n > N$ and $0 < \kappa < 1/2$,

where $a$ is the size of the true active set $\mathcal{M}_{-\mathcal{C}}$, $q$ is the size of the conditioning set $\mathcal{C}$, and $c_2$ is the same constant as in lemma 2.
Lemma 3 shows that the proposed maximum marginal pseudo-partial likelihood estimate $\hat{\beta}_j$ is a consistent estimate of $\beta_{j0}$. By lemmas 1 and 3, we can indeed distinguish $Z_j$ ($j \in \mathcal{M}_{-\mathcal{C}}$) from $Z_j$ ($j \notin \mathcal{M}_{-\mathcal{C}}$) by the proposed marginal utility $M_{\mathcal{C},j}$. Theorem 1 states the sure screening property of the CWSIS procedure.
Theorem 1 (The sure screening property) Under conditions C1-C8, for any $0 < \kappa < 1/2$ and $\epsilon_2 > 0$, there exists a positive constant $c_3$ such that

where $a$ is the size of the true active set $\mathcal{M}_{-\mathcal{C}}$ and $q$ is the size of the conditioning set $\mathcal{C}$. Furthermore, we have

From this theorem, we can see that all active covariates survive the screening with probability tending to one. The next theorem establishes the ranking consistency property of the proposed method.
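Theorem 2 (The ranking consistency) Under conditions C1-C8, we have
$$P\Big(\max_{j \notin \mathcal{M}_{-\mathcal{C}}} M_{\mathcal{C},j} < \min_{j \in \mathcal{M}_{-\mathcal{C}}} M_{\mathcal{C},j}\Big) \to 1 \quad \text{as } n \to \infty,$$
where $\mathcal{M}_{-\mathcal{C}}$ is the true active set outside the conditioning set and $M_{\mathcal{C},j}$ is the marginal utility of the $j$th covariate. This lays out the theoretical foundation that our procedure ranks the active covariates ahead of the inactive ones with overwhelming probability. The proofs of the theorems and lemmas are presented in Appendix B.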

Simulation studies
We examine the finite-sample performance of the proposed CWSIS procedure and compare it with some existing methods via simulation studies. For brevity, we refer to the feature aberration at survival times screening procedure of Gorst-Rasmussen and Scheike (2013) as FAST, the principled sure independence screening procedure of Zhao and Li (2012) as PSIS, and the censored rank independence screening of Song et al. (2014) as CRIS. Furthermore, we consider the marginal weighted screening procedure (MWSIS), where we fit a marginal Cox model for each $Z_{ij}$, construct the weighted estimating equation to obtain the estimate $\hat{\beta}_j$, and then define the active index set as $\{j : I_j(\hat{\beta}_j)^{1/2} |\hat{\beta}_j| \ge \gamma\}$, where $I_j(\hat{\beta}_j)$ denotes the information matrix. As PSIS, FAST and CRIS can only deal with SRS data, we generate SRS data with the same sample size as the case-cohort data for these three methods.
We consider survival data generated from the Cox proportional hazards model and employ independent Bernoulli sampling to generate the subcohort. We consider full cohort sample sizes n = 500 and 1000 and numbers of covariates p = 2000 and 4000. As the incidence rate in case-cohort studies is usually very low or moderate, we consider a failure rate of 20% for n = 500, and failure rates of 5% and 10% for n = 1000. We consider a noncase-to-case ratio of 1 : 1, so the sample size of the case-cohort data in our simulation studies equals 100 or 200. For each configuration, we run 500 replications and employ three evaluation criteria (Li et al., 2012b). The first is the minimum model size needed to include all active predictors, denoted by $\mathcal{S}$. We present the median and interquartile range (IQR) of $\mathcal{S}$ over the 500 replications. The second is the proportion of replications in which each individual important variable is selected into the model of a given size $d_0$, denoted by $\mathcal{P}_e$. The third is the proportion of replications in which all important variables are selected into the model of a given size $d_0$, denoted by $\mathcal{P}_a$. An effective screening procedure is expected to yield $\mathcal{S}$ close to the true minimum model size and both $\mathcal{P}_e$ and $\mathcal{P}_a$ close to one. Here, we choose $d_0 = \lceil n_{cc}/\log n_{cc} \rceil$ (Fan and Lv, 2008), where $n_{cc}$ is the case-cohort sample size and $\lceil x \rceil$ denotes the smallest integer not less than $x$.
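The three evaluation criteria can be computed from the ranked covariate lists of the replications, as in the following sketch. The function names are hypothetical, and each ranking is assumed to be a list of covariate indices ordered from the largest utility to the smallest. The model size uses the ceiling of $n_{cc}/\log n_{cc}$ with the natural logarithm, which agrees with the value $\lceil 155/\log(155) \rceil = 31$ reported for the breast cancer data.

```python
import math

def screening_size(n_cc):
    """d0 = ceil(n_cc / log(n_cc)), using the natural logarithm."""
    return math.ceil(n_cc / math.log(n_cc))

def min_model_size(ranking, active):
    """Smallest model size whose top-ranked covariates contain every active index."""
    position = {j: r + 1 for r, j in enumerate(ranking)}
    return max(position[j] for j in active)

def selection_proportions(rankings, active, d0):
    """Per-covariate and joint selection proportions for model size d0.

    rankings : one ranked index list per replication
    Returns (p_e, p_a): p_e[j] is the share of replications keeping j,
    p_a is the share of replications keeping all active indices.
    """
    reps = len(rankings)
    kept = [set(r[:d0]) for r in rankings]
    p_e = {j: sum(j in k for k in kept) / reps for j in active}
    p_a = sum(all(j in k for j in active) for k in kept) / reps
    return p_e, p_a

# Toy illustration with two replications and active covariates 1 and 3.
rankings = [[3, 1, 2, 0], [1, 0, 3, 2]]
sizes = [min_model_size(r, [1, 3]) for r in rankings]      # [2, 3]
p_e, p_a = selection_proportions(rankings, [1, 3], d0=2)   # {1: 1.0, 3: 0.5}, 0.5
```

For the case-cohort sample sizes used in the simulations, `screening_size(100)` gives 22 and `screening_size(200)` gives 38.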

Example 1.
The failure times $T_i$ are generated from the Cox proportional hazards model. The censoring time $C_i \sim \mathrm{Unif}(0, \tau)$, where the constant $\tau$ represents the end time of the study and is used to control the failure rate.
We compute the absolute correlation between the survival time $T$ and each covariate $Z_j$ ($j = 1, \ldots, p$) for $p = 2000$ through the inverse probability weighting scheme and summarize the marginal correlations in three groups: the active covariates ($Z_1, \ldots, Z_4$ for example 1 and $Z_1$ for example 2), the hidden active covariates ($Z_5$ for example 1 and $Z_p$ for example 2), and the inactive covariates ($Z_6, \ldots, Z_p$ for example 1 and $Z_2, \ldots, Z_{p-1}$ for example 2). Figures 2 and 3 depict the distribution of the absolute correlation for these three groups, from which we can see that the marginal signal strength of the hidden active covariates is weaker than that of the inactive covariates. Therefore, the marginal screening methods MWSIS, PSIS, FAST and CRIS have difficulty identifying the hidden active covariates, and the proposed conditional screening method CWSIS is a natural alternative. In our simulations, we simply choose $Z_1$ as the conditioning covariate. In practice, if no useful prior information about active covariates is available, we can choose the covariates with the highest marginal signal strength as the conditioning set (Barut et al., 2016; Lu and Lin, 2020). For a fair comparison, we add one (the number of conditioning covariates in our examples) to the minimum model size $\mathcal{S}$ for the proposed conditional screening method CWSIS.
The simulation results for the minimum model size $\mathcal{S}$ and the selection proportions $\mathcal{P}_e$ and $\mathcal{P}_a$ are summarized in Tables 1-2. From the values of $\mathcal{P}_e$ for $Z_5$ in example 1 and $Z_p$ in example 2, we can conclude that the proposed CWSIS procedure detects the hidden active covariates with high probability, while the other four methods, MWSIS, PSIS, FAST and CRIS, fail to select them. In example 2, $\rho$ equals 0, 0.3 and 0.7, with a larger $\rho$ yielding higher collinearity. The proposed CWSIS method performs well even under high collinearity, while the other four methods do not behave well even when $\rho = 0$, and their performance deteriorates as $\rho$ increases. As expected, CWSIS needs a smaller model size to achieve sure screening in all settings. A larger case-cohort sample size and a higher failure rate are associated with better performance. In particular, a larger cohort sample size handles rare disease situations better.
To assess the performance of the proposed method in settings similar to the real data, we further consider n = 300 and a failure rate of 25% for example 2, with the remaining setups kept the same as before. Here, we also consider the unweighted conditional screening method NCWSIS, which does not adopt the weight function and simply treats the case-cohort data as SRS data, and the conditional screening method CSMPLE of Hong et al. (2018). Since CSMPLE is proposed for SRS data and cannot be directly used to handle case-cohort data, we generate SRS data with the same sample size as the case-cohort data for CSMPLE. The simulation results for the minimum model size $\mathcal{S}$ and the selection proportions $\mathcal{P}_e$ and $\mathcal{P}_a$ are summarized in Table 3, from which we can see that the proposed method detects the hidden active covariates with high probability and shows a clear advantage in all the considered settings. Comparing the results of NCWSIS and CWSIS, we can conclude that the performance of the conditional screening method is improved by including the case-cohort weight. Moreover, comparing CSMPLE and CWSIS shows that the proposed conditional screening procedure based on the case-cohort design is more accurate in selecting the active covariates than conditional screening based on an SRS of the same size as the case-cohort sample. For example, when p = 2000 and $\rho = 0.7$, the value of $\mathcal{P}_a$ is only 0.460 for CSMPLE, while the corresponding value for the proposed CWSIS method equals 1.

Application to breast cancer data
As an illustration, we apply the proposed CWSIS method to the breast cancer data ( (2002), we use these 60 samples as the testing set and the case-cohort samples as our training set. The details of these two sets are summarized in Table 4. The interest of the study is to identify genes that have great influence on patients' overall survival rate.
We illustrate the proposed method by identifying genes that have a strong influence on patients' overall survival based on data from a case-cohort sample. Specifically, we select the subcohort by independent Bernoulli sampling with selection probability π = 0.37, which results in about the same number of cases and noncases. The subcohort has 111 subjects and the final case-cohort sample has 155 subjects. Gene AL080059 is known from the literature to be predictive of patients' survival time (Yeung et al., 2005; van't Veer et al., 2002), so we use it as the conditioning variable in the proposed procedure. Screening methods are usually used as an initial step to reduce the dimensionality and are then followed by a model-based regularization method. In particular, we first apply the proposed CWSIS procedure to reduce the dimension from p = 4919 to ⌈155/log(155)⌉ = 31 and then use the regularization methods LASSO, SCAD and MCP to select the significant genes among these 31 under the Cox proportional hazards regression framework, with the tuning parameter selected by 10-fold cross-validation. We summarize the names and the estimated coefficients of the selected genes in Table 5, from which we can see that genes Contig58368.RC, NM.014889, NM.005689, NM.013290, AL080059, NM.013332, Contig63649.RC and NM.002916 were selected by all of the LASSO, SCAD and MCP methods, indicating that these eight genes could be associated with patients' survival. Moreover, genes Contig58368.RC, NM.014889 and NM.005689 were ranked in the first three positions, which suggests that these three genes may have a strong influence on patients' survival.
To evaluate the predictive accuracy of CWSIS, we further compute the C-statistic estimator (Uno et al., 2011). For comparison, we also apply the MWSIS and NCWSIS procedures to these data. In particular, we first apply each of the three screening methods to reduce the dimension to ⌈155/log(155)⌉ = 31 and then perform LASSO penalization to further remove irrelevant covariates, with the tuning parameter selected by 10-fold cross-validation. We obtain the risk score for each subject from the final model selected by LASSO and compute the corresponding concordance statistic (C-statistic) (Uno et al., 2011) in the testing set. The standard deviations (SD) of the C-statistic are obtained from 1000 perturbation resamples. The values of the C-statistic, with SD in parentheses, are 0.862 (0.059), 0.796 (0.078) and 0.802 (0.053) for the CWSIS, MWSIS and NCWSIS procedures, respectively. According to Uno et al. (2011), the larger the C-statistic, the stronger the predictive power of the method. We can conclude that the proposed CWSIS method performs reasonably well for ultrahigh-dimensional survival data under the case-cohort design and delivers favorable predictive performance.
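To make the notion of concordance concrete, the following sketch computes a simple Harrell-type concordance index from risk scores. It is a deliberately simplified stand-in and not the estimator of Uno et al. (2011), which additionally reweights pairs by the inverse probability of censoring. The function name `concordance_index` is hypothetical.

```python
def concordance_index(time, delta, score):
    """Harrell-type concordance between risk scores and survival times.

    A pair (i, k) is comparable when subject i has an observed event and
    time[i] < time[k]; it is concordant when the subject failing earlier
    has the higher risk score. Ties in the score count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if delta[i] != 1:
            continue  # censored subjects cannot anchor a comparable pair
        for k in range(n):
            if time[i] < time[k]:
                comparable += 1
                if score[i] > score[k]:
                    concordant += 1.0
                elif score[i] == score[k]:
                    concordant += 0.5
    return concordant / comparable

time = [1.0, 2.0, 3.0, 4.0]
delta = [1, 1, 0, 1]
perfect = concordance_index(time, delta, [4.0, 3.0, 2.0, 1.0])   # 1.0
reversed_ = concordance_index(time, delta, [1.0, 2.0, 3.0, 4.0]) # 0.0
```

A value of 0.5 corresponds to risk scores that order the subjects no better than chance, so values well above 0.5 indicate useful discrimination.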
We also consider the cutoffs $d_n = n/2$, $n/3$ and $n/4$ when analyzing these data and summarize the results in the supplementary material, from which we can see that the genes selected under different cutoffs are highly consistent. Furthermore, we compute the C-statistic estimator for the CWSIS, MWSIS and NCWSIS procedures under these three cutoffs. The results in the supplementary material lead to conclusions similar to those obtained with $d_n = n/\log(n)$.

Conclusion
For ultrahigh-dimensional survival data under the case-cohort design, we propose a conditional screening procedure CWSIS by incorporating the prior information of active covariates. This method enables the detection of hidden active covariates, which is an outstanding advantage compared with the marginal screening procedures. Moreover, the proposed procedure does not require any complicated numerical optimization and is computationally efficient. Theoretically, it enjoys the sure screening property and ranking consistency property under some mild regularity conditions. In the development of the theoretical properties, we adopt the conditional linear expectation and conditional linear covariance, which are proposed in Hong et al. (2018) and are useful to specify the regularity conditions.
Some issues deserve further consideration. First, the proposed method requires prior information about active covariates, and such information may sometimes be difficult to obtain. Hong et al. (2016) proposed a data-driven method to obtain the conditioning set for generalized linear models. How to develop a data-driven conditional screening method for survival data under the case-cohort design is an interesting question. Furthermore, when we have prior knowledge of active covariates, how to balance it with the information extracted from the given data merits further investigation. Second, under our design, the subcohort is selected by independent Bernoulli sampling. When the subcohort is selected by simple random sampling without replacement, our method also works, although more complicated arguments would be needed to develop the theoretical properties. Moreover, when some covariates are available for all cohort members, we can consider a stratified case-cohort design based on those covariates. Third, more efficient screening methods could be developed that incorporate more complex prior knowledge, such as the network structure or the spatial information of the covariates.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
C4. All $Z_j$, $j \in \mathcal{M}_{-\mathcal{C}}$, are independent of all $Z_j$, $j \notin \mathcal{M}_{-\mathcal{C}}$, given $Z_{\mathcal{C}}$.
Conditions C1 and C2 are common assumptions in survival analysis (Andersen and Gill, 1982; Fleming and Harrington, 1991). Condition C3 assumes that the covariates are bounded; a similar condition is also used in Hong et al. (2018). Condition C4 is similar to the partial orthogonality assumption on the covariates. Condition C5 controls the total effect size of the covariates and is reasonable under the sparsity principle. Condition C6 is a typical assumption that has been widely used in the feature screening literature, such as condition 3 in Fan and Lv (2008), condition 2 in Li et al. (2012b), condition 2 in Song et al. (2014), and conditions 2 and 5 in Wu and Yin (2015). Condition C7 is a mild assumption that holds in many situations. Condition C8 is a common assumption on the case-cohort design.

Appendix B: lemmas and theoretic proofs
Let β , 0 be the solution of the equations u As a preparation, we first introduce some lemmas.
This lemma is extracted from lemma A3 of Ni et al. (2016).
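Lemma 6 For independent random variables $Y_1, \ldots, Y_n$ with bounded ranges $[-M, M]$ and zero mean,
$$P(|Y_1 + \cdots + Y_n| > x) \le 2 \exp\left\{-\frac{1}{2} \frac{x^2}{v + Mx/3}\right\}$$
for $v \ge \mathrm{var}(Y_1 + \cdots + Y_n)$.
This lemma is extracted from lemma 2.2.9 of van der Vaart and Wellner (1996).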

4.
This lemma is extracted from proposition 2 of Hong et al. (2018).

Lemma 8
The conditional linear covariance has the following properties:

3. For any increasing function $h(\cdot) : \mathbb{R} \to \mathbb{R}$ and random variable $\xi : \Omega \to \mathbb{R}$, we

This lemma is extracted from proposition 3 of Hong et al. (2018).

Proof of Lemma 1
Proof We first relate β j , then by condition C6, we relate it to α j . For any j ∉ and k ∈ , straightforward calculations entail that By the definition, we have By the definition of (β , j When α j ≠ 0, by condition C6, we have are both nonzero and have the same signs since they are equal. Specifically, P(δ = 1|Z) is the probability of occurrence of the event and S T S C = P(X > t|Z) represents the probability at risk at time t. For any t, we have By lemma 8, Cov* {Z j , P(δ = 1|Z)|Z } and Cov*(Z j , S T S C |Z ) have the opposite signs unless they are zero. This further implies that and E[Cov*{Z j , P(δ = 1|Z)|Z }] have opposite signs unless they are equal to zero. So F 2 j (β , 0 , 0) ≠ F 2 j (β , j 0 , β j 0 ), therefore, β j 0 ≠ 0.

Proof of Lemma 2
Proof By lemma 1, for any j ∈ , we have β j 0 ≠ 0. By Taylor expansion, there exists By the proof of lemma 1, v j (β , j By condition C3, |Z j | ≤ L 0 , then sup By the proof in lemma 1, F 2 j (β , j 0 , 0) and E[Cov * {Z j , P(δ = 1 | Z) | Z }] have opposite signs, which completes the proof.

Proof of Theorem 2
Proof For any $j \in \mathcal{M}_{-\mathcal{C}}$, we have $\alpha_j \neq 0$. From lemma 1, we know that $|\beta_{j0}| > 0$. Similarly, we have $|\beta_{j0}| = 0$ if $j \notin \mathcal{M}_{-\mathcal{C}}$. As $\hat{\beta}_j$ is a consistent estimator of $\beta_{j0}$ and $M_{\mathcal{C},j} = |\hat{\beta}_j| / \hat{\sigma}_j$, we can easily conclude that $P(\max_{j \notin \mathcal{M}_{-\mathcal{C}}} M_{\mathcal{C},j} < \min_{j \in \mathcal{M}_{-\mathcal{C}}} M_{\mathcal{C},j}) \to 1$ as $n \to \infty$, which completes the proof of theorem 2.

noncase-to-case ratio is 1 : 1, the failure rate equals 20%.

Table 5 The results of selected important genes for the breast cancer data using the regularization methods

Name  Est.  Name  Est.  Name  Est.